Abstract

We investigate high-frequency price dynamics in the foreign exchange market using data from the Reuters information system (the dataset has been provided to us by Olsen & Associates). We show that a naive definition of price (for example, the spot mid-price) may lead to wrong conclusions about price behavior, such as the presence of short-term covariances of returns.

For this purpose we introduce an algorithm that uses only the non-arbitrage principle to estimate real prices from the spot ones. The new definition leads to returns that are i.i.d. variables and are therefore free of spurious correlations. Furthermore, any apparent information contained in the data (measured by the Shannon entropy) disappears.
I. INTRODUCTION



The foreign exchange market is an over-the-counter (OTC) market not subject to any time restriction: it is open 24 hours a day. Since it is also the most liquid market in the world, and tick-by-tick quotes are available, the foreign exchange market is very convenient for the study of high-frequency behavior.

The foreign exchange market is made up of about 2000 financial institutions around the globe, which operate by buying or selling certain amounts of a given currency. A market maker (any of the financial institutions that make the market) is expected to quote simultaneously to its customers both a bid and an ask price, at which it is willing, respectively, to buy and to sell a standard amount of a given currency. Each of the major market makers shows a running list of its main bid and ask rates, and those rates are displayed to all market participants. In principle, each rate from each market maker is valid until a new rate is displayed by the same market maker. In practice this is not the case, and no information is given about the lifetime of each quote.

In analyzing recorded financial data, a difficult and puzzling problem is to define the real asset price. In principle, three different quotes for the asset are available: bid, ask and traded price (the price at which the transaction is actually made). Using a wrong definition of the asset price can lead to a wrong evaluation of price dynamics. For example, if the traded price is used to analyze price dynamics, a random zero-mean oscillation around the real price will be found at very short time scales.

We analyze the DEM/USD exchange rates taken from Reuters' EFX pages (the dataset has been provided to us by Olsen & Associates) over a period of one year, from January to December 1998. In this period 1,620,843 quote entries were recorded in the EFX system. The dataset provides a continuously updated sequence of bid and ask exchange rate quotation pairs from individual institutions, whose names and locations are also recorded. The EFX dataset contains no information on traded volumes or on the lifetime of quotes. Furthermore, EFX quotes are indicative: they do not imply that any amount of currency has actually been traded.

The aim of this work is to find the best definition of the asset price. We start by analyzing raw data under the assumption that the asset price is simply given by spot quotes. We find that this leads to an indeterminacy of the asset price at very short time scales and to spurious correlations of returns. We investigate one possible explanation, assuming that spot quotes contain an error made by the market maker in estimating the real price. This assumption alone does not yield the real price, so we then introduce an algorithm which, by reducing the spread between bid and ask quotes, determines the real price and resolves the indeterminacy. In the last section we use information theory to strengthen our results. The key point of our work is that we determine the real price with a parameter-free algorithm that uses only the non-arbitrage principle.

II. A NAIVE APPROACH TO THE STUDY OF FX MICROSTRUCTURE 

The aim of this section is to show that a naive approach to the analysis of the foreign exchange market may lead to wrong conclusions about price dynamics.

We analyze DEM/USD exchange quotes from the EFX Reuters information system for the entire year 1998. The dataset records each bid and ask quote as given by the market operators. It contains no information on trading prices or on volumes of currencies traded, only tick-by-tick exchange rates. Prices are irregularly spaced in time; instead of sampling the data at some arbitrarily fixed time interval, we decided to use business time as our time index (see […] for an exhaustive investigation of the problem). With this choice, t takes all integer values up to N, the number of quotes in the dataset.

We denote by $S^{(b)}_t$ and $S^{(a)}_t$ the bid and ask quotes at time $t$, respectively. For our analysis we take as spot price the average of bid and ask quotes, $S_t = (S^{(b)}_t + S^{(a)}_t)/2$. We stress that this choice of spot price is not critical: the same results are obtained if bid or ask quotes are used.

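We define the return between two consecutive business times as

$$ r_t = \ln\frac{S_{t+1}}{S_t} \tag{1} $$

and, in general, the return at time $t$ and lag $\tau$ as

$$ r_t(\tau) = \ln\frac{S_{t+\tau}}{S_t}. \tag{2} $$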

Using this dataset, we estimated the $\tau$-dependent variance of returns,

$$ \langle r_t^2(\tau) \rangle, \tag{3} $$

the neighboring covariance of two consecutive returns over $\tau$ lags,

$$ \langle r_{t+\tau}(\tau)\, r_t(\tau) \rangle, \tag{4} $$

and the non-overlapping covariance of returns,

$$ \langle r_{t+\tau+s}(\tau)\, r_t(\tau) \rangle, \tag{5} $$

where $s \ge 1$. In all three definitions $\langle \cdot \rangle$ denotes an average over the probability distribution. Results are shown in Fig. 1. The variance of returns is a linear function of the time lag $\tau$, as expected, but it does not vanish in the limit $\tau \to 0$. This implies an implicit indeterminacy in the price estimate at vanishing time lags. The same indeterminacy is responsible for the negative covariance of two consecutive returns (see below).
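The estimators behind Fig. 1 are not spelled out in the text. A minimal sketch follows, assuming that the quotes are stored in a list indexed by business time and that the ensemble average $\langle \cdot \rangle$ is replaced by a time average over $t$ (our assumption); as in the text, means are not subtracted. The function names `log_returns` and `return_moments` are hypothetical.

```python
import math
from statistics import fmean


def log_returns(prices, tau):
    """r_t(tau) = ln(S_{t+tau} / S_t) for every t with t + tau in range (eq. 2)."""
    return [math.log(prices[t + tau] / prices[t]) for t in range(len(prices) - tau)]


def return_moments(prices, tau, s=1):
    """Time-average versions of eqs. (3)-(5) for one lag tau.

    Business time is simply the list index; s >= 1 is the gap used
    for the non-overlapping covariance.
    """
    r = log_returns(prices, tau)
    variance = fmean(x * x for x in r)                                    # eq. (3)
    neighboring = fmean(r[t + tau] * r[t] for t in range(len(r) - tau))   # eq. (4)
    non_overlap = fmean(r[t + tau + s] * r[t]
                        for t in range(len(r) - tau - s))                 # eq. (5)
    return variance, neighboring, non_overlap
```

Scanning `tau` over a range of lags and plotting the three outputs gives curves of the kind shown in Fig. 1.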

In order to explain these facts, it has been suggested that the spot price is the composition of two different stochastic processes: a real price change and a noise contribution resulting from erroneous evaluations of the real price by the market operators.

Given the spot price $S_t$ at business time $t$, we can express the two contributions as

$$ S_t = \bar S_t\, e^{\epsilon_t}, \tag{6} $$

where $\bar S_t$ is the real price and $\epsilon_t = \ln(S_t/\bar S_t)$ is the error contribution. Defining the real-price return $\rho_t(\tau) = \ln(\bar S_{t+\tau}/\bar S_t)$, the spot and real returns are related by

$$ r_t(\tau) = \rho_t(\tau) - \epsilon_t + \epsilon_{t+\tau}. \tag{7} $$

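In this framework we can explain the behavior of the variance and of the other quantities reported in Fig. 1. Indeed, with the above definitions, the $\tau$-dependent variance can be calculated analytically:

$$ \langle r_t^2(\tau) \rangle = 2\langle \epsilon^2 \rangle + \langle \rho^2 \rangle\, \tau, \tag{8} $$

where $\langle \rho^2 \rangle$ is the variance of one-step real returns $\rho_t(1)$, and it has been assumed that $\epsilon_t$ and $\rho_t$ are mutually uncorrelated, zero-mean i.i.d. random variables. The neighboring covariance of two consecutive returns over $\tau$ business times is

$$ \langle r_{t+\tau}(\tau)\, r_t(\tau) \rangle = -\langle \epsilon^2 \rangle, \tag{9} $$

and the non-overlapping covariance of returns is

$$ \langle r_{t+\tau+s}(\tau)\, r_t(\tau) \rangle = 0. \tag{10} $$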
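The step from (7) to (8), (9) and (10) can be made explicit. Write $\epsilon_t$ for the error term and $\rho_t(\tau) = \sum_{j=0}^{\tau-1} \rho_{t+j}(1)$ for the real-price return, and assume, as above, that the $\epsilon_t$ and the one-step returns $\rho_t(1)$ are zero-mean, mutually independent i.i.d. variables. All cross terms then average to zero and only the squared terms survive:

$$
\begin{aligned}
\langle r_t^2(\tau) \rangle &= \langle \rho_t^2(\tau) \rangle + \langle \epsilon_{t+\tau}^2 \rangle + \langle \epsilon_t^2 \rangle = \tau \langle \rho^2 \rangle + 2\langle \epsilon^2 \rangle, \\
\langle r_{t+\tau}(\tau)\, r_t(\tau) \rangle &= \big\langle \big(\rho_{t+\tau}(\tau) - \epsilon_{t+\tau} + \epsilon_{t+2\tau}\big)\big(\rho_t(\tau) - \epsilon_t + \epsilon_{t+\tau}\big) \big\rangle = -\langle \epsilon_{t+\tau}^2 \rangle = -\langle \epsilon^2 \rangle .
\end{aligned}
$$

For the non-overlapping covariance with $s \ge 1$, the two returns share neither an error term nor a real-price increment, so their product averages to zero.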



This picture agrees with what is observed in Fig. 1. From the data we can therefore estimate $\langle \epsilon^2 \rangle = (2.0 \pm 0.2) \times 10^{-8}$ and $\langle \rho^2 \rangle = (0.64 \pm 0.05) \times 10^{-8}$ for the particular dataset analyzed. We stress that equations (8) and (9) give two independent estimates of the variance allocated to the error contribution. We find that the two values, computed from the data of Fig. 1, coincide within errors.
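To check that the error model reproduces the pattern of (8) and (9), one can simulate it. The sketch below is a hypothetical illustration, not the analysis performed on the EFX data: it assumes Gaussian shocks and a random-walk real log-price, and plugs in the magnitudes $\langle \epsilon^2 \rangle = 2.0 \times 10^{-8}$ and $\langle \rho^2 \rangle = 0.64 \times 10^{-8}$ quoted above. It works directly with log-prices, so returns are differences of logs.

```python
import random
from statistics import fmean


def simulate_error_model(n, var_eps, var_rho, seed=0):
    """Log spot prices ln S_t = ln Sbar_t + eps_t (eq. 6) with a random-walk real price.

    Gaussian shocks are an assumption made only for this sketch.
    """
    rng = random.Random(seed)
    log_real, log_spot = 0.0, []
    for _ in range(n):
        log_real += rng.gauss(0.0, var_rho ** 0.5)                  # one-step real return
        log_spot.append(log_real + rng.gauss(0.0, var_eps ** 0.5))  # add error eps_t
    return log_spot


def moments_from_logs(x, tau):
    """Sample variance (eq. 3) and neighboring covariance (eq. 4) at lag tau."""
    r = [x[t + tau] - x[t] for t in range(len(x) - tau)]          # eq. (2) in log form
    var = fmean(v * v for v in r)
    cov = fmean(r[t + tau] * r[t] for t in range(len(r) - tau))
    return var, cov


VAR_EPS, VAR_RHO = 2.0e-8, 0.64e-8   # magnitudes quoted in the text for spot data
x = simulate_error_model(200_000, VAR_EPS, VAR_RHO, seed=1)
for tau in (1, 5, 20):
    var, cov = moments_from_logs(x, tau)
    # Compare with the predictions 2<eps^2> + <rho^2> tau and -<eps^2>.
    print(tau, var, 2 * VAR_EPS + VAR_RHO * tau, cov, -VAR_EPS)
```

With 200,000 simulated quotes the sample variance should track $2\langle\epsilon^2\rangle + \langle\rho^2\rangle\tau$ and the neighboring covariance should sit near $-\langle\epsilon^2\rangle$, up to sampling noise.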

To complete the picture, we also estimated the covariance function at time lag $\tau$, defined as

$$ \langle r_{t+\tau}\, r_t \rangle, \tag{11} $$

where we have taken $\langle r_t \rangle = 0$. Results are plotted in Fig. 2. The figure shows that spot returns are negatively correlated at one step ($\langle r_{t+1} r_t \rangle = -\langle \epsilon^2 \rangle$), while for $\tau > 1$ we have $\langle r_{t+\tau} r_t \rangle = 0$, in agreement with previous findings [5].
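The covariance function (11) can be estimated in the same spirit. In the sketch below the toy returns come from a constant real price plus i.i.d. errors of size $\pm a$, an illustrative assumption rather than the EFX data; in expectation only the one-step covariance, $-a^2 = -\langle \epsilon^2 \rangle$, survives. The function name `return_autocovariance` is hypothetical.

```python
import random
from statistics import fmean


def return_autocovariance(r, max_lag):
    """Sample <r_{t+tau} r_t> for tau = 1..max_lag (eq. 11), taking <r_t> = 0 as in the text."""
    return [fmean(r[t + tau] * r[t] for t in range(len(r) - tau))
            for tau in range(1, max_lag + 1)]


# Toy spot returns r_t = eps_{t+1} - eps_t: constant real price plus i.i.d. +/-a errors
# (an illustrative assumption, not the EFX data).
a = 1e-4
rng = random.Random(7)
eps = [rng.choice((-a, a)) for _ in range(100_001)]
r = [eps[t + 1] - eps[t] for t in range(len(eps) - 1)]
acov = return_autocovariance(r, 3)
# In expectation: -a**2 at lag 1 and zero at lags 2 and 3.
print([c / a ** 2 for c in acov])
```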



III. A MORE REALISTIC APPROACH 



The aim of this section is to find an algorithm able to separate the two contributions to the spot price. The algorithm should resolve the indeterminacy found when the spot price is used to analyze high-frequency price dynamics. The previous section provides constraints on the variance allocated to the real price and to the error distribution, and the algorithm should be consistent with these constraints.

In the DEM/USD 1998 dataset, each quote at each business time is associated with the financial institution that set it. In principle this quote should be valid until the same bank posts a different exchange rate (for both bid and ask prices). In practice, between two quotes from the same bank there are several quotes set by other institutions around the world. This suggests that a bank quote expires after a certain time even if the same bank has not set a new one. If the dataset contained information on the duration of each quote, establishing the real price at each time would be straightforward: it would be determined by the best bid and ask quotes valid at that time. Since this information is not available, a different strategy is needed to establish the real price at each time.

The algorithm we propose is the following. Suppose that we observe the bid and ask prices of a given currency and that we can detect every quote from all financial institutions in business time $t$.

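We define the distance between the ask quote $S^{(a)}_t$ and the bid quote $S^{(b)}_t$ as $D_t = S^{(a)}_t - S^{(b)}_t$. By the non-arbitrage principle this quantity is greater than or equal to zero. Considering the $k$ time lags preceding business time $t$, we define the distance

$$ D_{t,k} = \min_{i \in \{t-k,\dots,t\}} S^{(a)}_i \;-\; \max_{i \in \{t-k,\dots,t\}} S^{(b)}_i. \tag{12} $$

For each $t$, our algorithm finds the $k$ for which $D_{t,k} \ge 0$ and $D_{t,k+1} < 0$. Since $D_{t,0} = D_t \ge 0$ and $D_{t,k}$ can only decrease as $k$ grows (the minimum ask and the maximum bid are taken over ever larger sets), this $k$ is the largest number of past quotes that can be pooled without creating an arbitrage opportunity. The real price is then given by

$$ \bar S_t = \frac{1}{2}\left( \min_{i \in \{t-k,\dots,t\}} S^{(a)}_i + \max_{i \in \{t-k,\dots,t\}} S^{(b)}_i \right). \tag{13} $$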

In this way we define a currency quote at each time. Notice that the number of steps we go back in time is fixed only by the non-arbitrage principle, and it differs for each $t$. Once we have obtained $\bar S_t$, we define $\rho_t(\tau) = \ln(\bar S_{t+\tau}/\bar S_t)$ and compute all the quantities (variances and correlations) already computed for the naive price definition.
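The procedure of (12) and (13) is short enough to write out. The sketch below is our reading of it, not the authors' code: `bids` and `asks` are hypothetical lists of quotes indexed by business time, all quotes in the window are pooled regardless of the issuing institution, and when no arbitrage appears all the way back to the first quote we simply pool every available quote (the text does not specify this edge case).

```python
def real_prices(bids, asks):
    """Sketch of eqs. (12)-(13).

    For each business time t, extend the window {t-k, ..., t} backwards while
    min(ask) - max(bid) stays >= 0 (no arbitrage), then return the mid-point
    of the pooled best ask and best bid.
    """
    if len(bids) != len(asks):
        raise ValueError("bids and asks must have the same length")
    out = []
    for t in range(len(bids)):
        best_ask, best_bid = asks[t], bids[t]    # k = 0: the quote itself, D_t >= 0
        i = t - 1
        while i >= 0:
            cand_ask = min(best_ask, asks[i])    # window grows to include quote i
            cand_bid = max(best_bid, bids[i])
            if cand_ask - cand_bid < 0:          # D_{t,k+1} < 0: arbitrage, stop
                break
            best_ask, best_bid = cand_ask, cand_bid
            i -= 1
        out.append(0.5 * (best_ask + best_bid))  # eq. (13)
    return out


print(real_prices([1.00, 1.01, 0.99], [1.03, 1.05, 1.04]))
```

Because $D_{t,k}$ is non-increasing in $k$, stopping at the first violation gives the largest admissible window, at a cost of $O(k)$ operations per quote. In the usage example the real prices are approximately 1.015, 1.02 and 1.02: at the third quote the spot spread $1.04 - 0.99 = 0.05$ shrinks to $1.03 - 1.01 = 0.02$ after pooling.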

As stated above, if our algorithm is correct, the indeterminacy contained in the spot price should be removed in the real price. We therefore replicate the analysis described in Section II for the spot price, now using the real price $\bar S_t$ defined above. Results are presented in Fig. 3. The variance of returns goes to zero as the business-time lag goes to zero: the experimental value of $\langle \epsilon^2 \rangle$ in equation (8) for the real price is $(3 \pm 1) \times 10^{-10}$, about two orders of magnitude smaller than for the spot price. The neighboring covariance of two consecutive returns also goes to zero. Another interesting result is that the variance of real returns we obtain ($\langle \rho^2 \rangle = 0.64 \times 10^{-8}$) is identical, within errors, to the value estimated from the spot price through equation (8).

If we estimate the covariance of real returns as in equation (11), we find that real price returns are uncorrelated at every step (see Fig. 2, where this covariance is compared with the covariance (11) of the naive spot returns).

The idea used here is very simple: we assume that old quotes remain valid until they produce an arbitrage. Despite this simplicity, the procedure removes the artifacts found in the spot data in Section II.

IV. INFORMATION ANALYSIS 

To perform an information analysis on our dataset, we first need to code the original data as a sequence of symbols [6]. There are several ways to build such a sequence; one must make sure that the coding does not alter too much the structure of the process underlying the evolution of financial data. A partition of the range of variability of the data is needed, so that a conventional symbol can be assigned to each element of the partition; each element of the partition then corresponds unambiguously to one symbol. The procedure described below codes financial data as a sequence of binary symbols, from which the available information can then be quantified.

We fix a resolution value $\Delta$ and define

$$ r_{t_i}(\tau) = \ln\frac{S_{t_i+\tau}}{S_{t_i}}, \tag{14} $$

where $t_i$ is a given business time and $S_t$ is the price series under study. We then wait until the exit time $\tau_i$, i.e., the first lag such that

$$ |r_{t_i}(\tau_i)| > \Delta. \tag{15} $$

In this way we only consider market fluctuations of amplitude larger than $\Delta$. We build a sequence of returns $r_{t_i}(\tau_i)$, with $t_{i+1} = t_i + \tau_i$, and code it in binary form according to the rule

$$ c_i = \begin{cases} -1 & \text{if } r_{t_i}(\tau_i) < 0, \\ +1 & \text{if } r_{t_i}(\tau_i) > 0. \end{cases} \tag{16} $$

The procedure described above corresponds to a patient investor who waits before updating his investment strategy until the market shows a given behavior, for example a fluctuation of size $\Delta$.
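A minimal sketch of this exit-time coding, under stated assumptions: prices are in a list indexed by business time, the coding starts at a hypothetical parameter `t_start`, and an excursion that has not left the band $\pm\Delta$ when the data end is discarded (the text does not say how this last piece is treated). The function name `exit_time_symbols` is hypothetical.

```python
import math


def exit_time_symbols(prices, delta, t_start=0):
    """Code a price series into +/-1 symbols via exit times (eqs. 14-16).

    Starting at t_1 = t_start, wait for the first lag tau_i with
    |ln(S_{t_i+tau_i} / S_{t_i})| > delta, emit its sign, restart at t_i + tau_i.
    A final excursion that never exits the band is discarded (assumption).
    """
    symbols = []
    t_i = t_start
    while t_i < len(prices):
        exit_found = False
        for tau in range(1, len(prices) - t_i):
            r = math.log(prices[t_i + tau] / prices[t_i])   # eq. (14)
            if abs(r) > delta:                               # eq. (15)
                symbols.append(1 if r > 0 else -1)           # eq. (16)
                t_i += tau
                exit_found = True
                break
        if not exit_found:
            break
    return symbols
```

For instance, a log-price path 0, 0.5, 1.2, 0.9, 0.1, -0.3, 0.4 with $\Delta = 1$ is coded as `[1, -1]`: the first exit is upward at the third point, the second downward at the fifth, and the last excursion never leaves the band.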

Once we have built a symbolic sequence, we can estimate the block entropy, defined for sequences of $n$ symbols as

$$ H_n = -\sum_{C_n} p(C_n) \ln p(C_n), \tag{17} $$

where $C_n = \{c_1, \dots, c_n\}$ is a sequence of $n$ symbols, $p(C_n)$ its probability, and the sum runs over all possible sequences of length $n$. The difference

$$ h_n = H_{n+1} - H_n \tag{18} $$

represents the average information needed to specify the symbol $c_{n+1}$ given knowledge of the previous sequence $\{c_1, \dots, c_n\}$.
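The text does not state how the block probabilities $p(C_n)$ are estimated. The sketch below assumes the simplest choice, relative frequencies of overlapping blocks (a plug-in estimator), with natural logarithms so that the maximum of $h_n$ is $\ln 2$. Plug-in estimates are biased downwards when the number of possible blocks, $2^n$, becomes comparable to the sequence length, which limits the largest $n$ that can be estimated reliably. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def block_entropy(symbols, n):
    """Plug-in estimate of H_n (eq. 17) from overlapping blocks of length n (natural log)."""
    if n == 0:
        return 0.0
    blocks = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log(c / total) for c in blocks.values())


def entropy_differences(symbols, n_max):
    """h_n = H_{n+1} - H_n (eq. 18) for n = 0..n_max, with H_0 = 0."""
    H = [block_entropy(symbols, n) for n in range(n_max + 2)]
    return [H[n + 1] - H[n] for n in range(n_max + 1)]


# Usage: a memoryless fair +/-1 sequence should give every h_n close to ln 2.
rng = random.Random(3)
fair = [rng.choice((-1, 1)) for _ in range(100_000)]
print(entropy_differences(fair, 3), math.log(2))
```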

The sequence $h_n$ is monotonically non-increasing, and for an ergodic process

$$ h = \lim_{n \to \infty} h_n, \tag{19} $$

where $h$ is the Shannon entropy [7]. It can be shown that if the stochastic process $\{c_1, \dots, c_n\}$ is Markovian of order $k$ (i.e., the probability of $c_n$ depends only on the previous $k$ symbols $c_{n-1}, c_{n-2}, \dots, c_{n-k}$), then $h_n = h$ for $n \ge k$. Otherwise, either $h_n$ goes to zero for increasing $n$, which means that for sufficiently large $n$ the $(n+1)$-th symbol is predictable from the sequence $C_n$, or it tends to a finite positive value. The maximum value of $h$ for a binary sequence is $\ln 2$; it is reached when the process has no memory at all and the two symbols are equally probable. The difference between $\ln 2$ and $h$ is, intuitively, the amount of information we can use to predict the next outcome of the phenomenon we observe, i.e., the market behavior.
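As a worked example of an order-1 Markov sequence, consider a hypothetical symmetric binary chain in which each symbol differs from the previous one with probability $q$. Its stationary distribution is uniform, so $H_1 = \ln 2$, and since $c_{n+1}$ depends only on $c_n$,

$$ h_n = h = -q \ln q - (1-q) \ln(1-q) \qquad (n \ge 1). $$

For the purely illustrative value $q = 0.6$ (a tendency to reverse sign, not a value estimated from the data), $h \approx 0.673$, so $\ln 2 - h \approx 0.020$ nats per symbol are available for prediction; the available information vanishes only for $q = 1/2$.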

In Fig. 4, $h_n$ is estimated for both the real ($\bar S_t$) and the spot ($S_t$) prices. The two price definitions clearly behave differently. For the spot price we find non-zero available information ($\ln 2 - h_n \neq 0$), and the stochastic process is a Markov process of order 1; the real price does not show this behavior. For the real price the available information is zero, and it remains zero at every step (owing to the finite number of data we can only estimate $h_n$ up to $n \simeq 9$, but we can extrapolate its behavior for $n \to \infty$). This shows that the real price is (unfortunately) a stochastic process with neither memory nor predictability.

V. CONCLUSIONS 

The aim of this work is to find a correct way to extract real prices from quotes taken from the Reuters information system. Our dataset contains 1,620,843 bid and ask DEM/USD quotes recorded during the entire year 1998, from 1 January to 31 December.

In Section II we show that naive processing of the raw dataset gives a wrong picture of price dynamics. Indeed, one finds an implicit indeterminacy in the price specification, which increases the volatility and produces spurious covariances. We then explain this indeterminacy by means of an error contribution responsible for the increased volatility and for the covariances.

We then introduce a parameter-free algorithm, based only on the non-arbitrage principle, which extracts the real prices from the spot ones. The correctness of the procedure is corroborated by several results presented in this work. First, we show that with the new price definition the indeterminacy and the one-step anticorrelation drop to zero. We also show, through information analysis, that the stochastic process for the newly defined price has no short-range memory.

Given our results, we think that particular attention must be paid to the definition of prices used when studying price dynamics, in order to avoid wrong conclusions such as the existence of short-term return correlations.

We stress that we are able to define real prices directly from spot quotes, without further information (such as the time of validity of quotes) of the kind one could obtain from an electronic broking system [8].

In conclusion, we propose our method as a general tool for processing raw datasets, yielding a new dataset of the same length whose data better represent price evolution at very short time scales.
FIG. 1: DEM/USD spot exchange rates: variance (crosses) compared with the linear fit $2A + B\tau$, neighboring covariance (circles) compared with $-A$, non-overlapping covariance (stars) compared with zero. $A$ and $B$ are identified with $\langle \epsilon^2 \rangle$ and $\langle \rho^2 \rangle$.
FIG. 2: Covariance $\langle r_t r_{t+\tau} \rangle$ and $\langle \rho_t \rho_{t+\tau} \rangle$ for spot (squares) and real (circles) returns.
FIG. 3: DEM/USD real exchange rates: variance (crosses) compared with the linear fit $B\tau$, neighboring covariance (circles) compared with zero, non-overlapping covariance (stars) compared with zero. $B$ is identified with $\langle \rho^2 \rangle$.
FIG. 4: Information for spot (squares) and real (circles) prices.